In this capstone project, I apply the skills and tools I learned in the Coursera courses. I have selected New York City for the project, and I am helping the stakeholders narrow down the location for their new business.
In this project, we are assisting a large Indian restaurant chain that wants to open a restaurant abroad. Our stakeholders currently own around 200 restaurants in India and want to expand by opening their first restaurant in New York City. Our task is to recommend where in New York City, USA, the chain should open it.
Since New York City already has a great many restaurants, we will look for areas that are not crowded with Indian restaurants, favoring those with the fewest.
The stakeholders are interested not only in a location for the new restaurant but also in one where it can make a good profit. So we are helping them find areas that are busy with other venue categories, such as Arts & Entertainment, Colleges & Universities, and professional offices.
New York City has five boroughs. We will look at the number of Indian restaurants in all five. Based on the count and density of restaurants in each borough, we can decide where to open the new restaurant.
The following factors will influence our decision:
The following data sources will be needed to extract or generate the required information:
import requests # Library to handle requests
import pandas as pd # Library for data analysis
import numpy as np # Library to handle data in a vectorized manner
import random # Library for random number generation
from geopy.geocoders import Nominatim # The module to convert an address into latitude and longitude values
import folium # plotting library
from sklearn.cluster import KMeans # For unsupervised clustering
New York City has five boroughs.
More information about the boroughs can be found on Wikipedia.
Let's first find the latitude and longitude of all five boroughs using the geopy library.
# Collect each borough's coordinates, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
boroughs = ['The Bronx','Brooklyn','Manhattan','Queens','Staten Island']
# Create the geocoder once and reuse it for every lookup
geolocator = Nominatim(user_agent="foursquare_agent")
rows = []
for borough in boroughs:
    location = geolocator.geocode(borough + ', New York, NY')
    lat = location.latitude
    long = location.longitude
    rows.append({'Borough':borough,'Latitude':lat,'Longitude':long})
df_borough = pd.DataFrame(rows, columns=['Borough','Latitude','Longitude'])
df_borough
| | Borough | Latitude | Longitude |
|---|---|---|---|
| 0 | The Bronx | 40.846651 | -73.878594 |
| 1 | Brooklyn | 40.650104 | -73.949582 |
| 2 | Manhattan | 40.789624 | -73.959894 |
| 3 | Queens | 40.749824 | -73.797634 |
| 4 | Staten Island | 40.583456 | -74.149605 |
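The lookup above assumes every query succeeds. geopy's `geocode` returns `None` when it finds no match, so reading `.latitude` from the result would then raise an `AttributeError`. A minimal sketch of a guard follows. The helper `geocode_borough` and the offline stand-in `fake_geocode` are hypothetical names introduced only for illustration; they are not part of geopy.

```python
from types import SimpleNamespace
from typing import Callable, Optional, Tuple


def geocode_borough(geocode: Callable, borough: str,
                    suffix: str = ', New York, NY') -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for a borough, or None if nothing is found."""
    location = geocode(borough + suffix)
    if location is None:
        # No match: let the caller decide whether to skip or retry
        return None
    return (location.latitude, location.longitude)


# Hypothetical offline stand-in for geolocator.geocode, so the sketch runs
# without a network call. The coordinates are copied from the table above.
def fake_geocode(query):
    known = {
        'Queens, New York, NY': SimpleNamespace(latitude=40.749824,
                                                longitude=-73.797634),
    }
    return known.get(query)


found = geocode_borough(fake_geocode, 'Queens')
missing = geocode_borough(fake_geocode, 'Atlantis')
print(found)
print(missing)
```

With the real geocoder, the call would be `geocode_borough(geolocator.geocode, 'Queens')`, and a `None` result can be skipped or logged before the row is added to the dataframe.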
Based on information from Wikipedia and a visual inspection of the map, a search radius (in meters) is assigned to each borough in the dataframe.
df_borough['Radius'] = [7000, 10000, 10000, 12000, 12000]
df_borough
| | Borough | Latitude | Longitude | Radius |
|---|---|---|---|---|
| 0 | The Bronx | 40.846651 | -73.878594 | 7000 |
| 1 | Brooklyn | 40.650104 | -73.949582 | 10000 |
| 2 | Manhattan | 40.789624 | -73.959894 | 10000 |
| 3 | Queens | 40.749824 | -73.797634 | 12000 |
| 4 | Staten Island | 40.583456 | -74.149605 | 12000 |
Using the above coordinates, a circle will be drawn for each borough. This will help us understand the search radius for the restaurants.
# Center the map on the mean of the borough coordinates
latitude_ny = df_borough['Latitude'].mean()
longitude_ny = df_borough['Longitude'].mean()
# Use GeoJSON data to draw the border of each borough
ny_boroughs_url = 'https://raw.githubusercontent.com/codeforamerica/click_that_hood/master/public/data/new-york-city-boroughs.geojson'
ny_boroughs = requests.get(ny_boroughs_url).json()
def boroughs_style(feature):
    return { 'color': 'black', 'fill': False }
# Create a map of New York City using the latitude and longitude values
map_newyork = folium.Map(location=[latitude_ny, longitude_ny], zoom_start=10)
# Add a marker and a search-radius circle (radius in meters) for each borough
for lat, lng, label, rad in zip(df_borough['Latitude'], df_borough['Longitude'], df_borough['Borough'], df_borough['Radius']):
    label = folium.Popup(label, parse_html=True)
    folium.Circle([lat, lng], radius=rad, color='blue', fill=False).add_to(map_newyork)
    folium.Marker([lat, lng], popup=label).add_to(map_newyork)
# Overlay the borough borders on top of the circles
folium.GeoJson(ny_boroughs, style_function=boroughs_style, name='geojson').add_to(map_newyork)
map_newyork
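Because the search areas are circles rather than the borough outlines, neighboring circles can overlap, and a venue in an overlap could be counted for two boroughs. The sketch below is one way to check this numerically. It uses the haversine great-circle distance with a mean Earth radius of 6,371 km (an assumption of this sketch), and it copies the centers and radii from the dataframe above. The function name `haversine_m` is introduced here for illustration.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed)

# (latitude, longitude, search radius in meters), copied from df_borough
boroughs = {
    'The Bronx': (40.846651, -73.878594, 7000),
    'Brooklyn': (40.650104, -73.949582, 10000),
    'Manhattan': (40.789624, -73.959894, 10000),
    'Queens': (40.749824, -73.797634, 12000),
    'Staten Island': (40.583456, -74.149605, 12000),
}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Two circles overlap when their centers are closer than the sum of their radii
overlaps = {}
for (name_a, (lat_a, lon_a, r_a)), (name_b, (lat_b, lon_b, r_b)) in combinations(boroughs.items(), 2):
    dist = haversine_m(lat_a, lon_a, lat_b, lon_b)
    overlaps[(name_a, name_b)] = dist < r_a + r_b
    print(f'{name_a} / {name_b}: {dist / 1000:.1f} km apart, overlap={overlaps[(name_a, name_b)]}')
```

If two circles overlap, one option is to remove duplicate venues by their ID after the search, or to assign each venue to the borough whose border actually contains it.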